LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance
Authors
Abstract
Provenance traces captured by scientific workflows can be useful for designing, debugging and maintaining workflows. However, our experience suggests that they are of limited use for reporting results, in part because traces lack the domain-specific annotations needed to explain results, and in part because some workflow activities are black boxes. We show that, with basic mark-up of the data processing within activities and a set of domain-specific label generation functions, standard workflow provenance can be utilised as a platform for labelling data artefacts. These labels can in turn aid the selection of data subsets and serve as a proxy for data descriptors of shared datasets.
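The idea in the abstract can be illustrated with a minimal sketch. This is not LabelFlow's actual implementation: the `Invocation` record, the `label_artefacts` function, the activity names, and the label strings are all hypothetical, and the trace is assumed to be listed in execution order. Activities that have been marked up as label-propagating pass their input labels on to their outputs; activities without mark-up are treated as black boxes, so labels stop there.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Invocation:
    """One activity run in a provenance trace (hypothetical record shape)."""
    activity: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


# A label generation function mints domain-specific labels for one invocation.
LabelFn = Callable[[Invocation], Set[str]]


def label_artefacts(
    trace: Iterable[Invocation],
    label_fns: Dict[str, LabelFn],
    propagating: Set[str],
) -> Dict[str, Set[str]]:
    """Attach labels to data artefacts by walking the trace in execution order."""
    labels: Dict[str, Set[str]] = {}
    for inv in trace:
        inherited: Set[str] = set()
        # Only marked-up activities carry input labels through to outputs.
        if inv.activity in propagating:
            for artefact in inv.inputs:
                inherited |= labels.get(artefact, set())
        # Domain-specific labels minted for this activity, if a function exists.
        fn = label_fns.get(inv.activity)
        minted = fn(inv) if fn is not None else set()
        for out in inv.outputs:
            labels.setdefault(out, set()).update(inherited | minted)
    return labels


# Hypothetical three-step trace; "plot" has no mark-up and acts as a black box.
trace = [
    Invocation("fetch_species", (), ("names.csv",)),
    Invocation("clean_names", ("names.csv",), ("clean.csv",)),
    Invocation("plot", ("clean.csv",), ("figure.png",)),
]
label_fns = {
    "fetch_species": lambda inv: {"source:catalogue"},
    "clean_names": lambda inv: {"curated"},
}
result = label_artefacts(trace, label_fns, propagating={"clean_names"})
```

In this sketch `clean.csv` ends up with both the inherited and the newly minted label, while `figure.png` gets none, which mirrors the point that unannotated activities block the flow of descriptive information.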
Similar resources
Abstract Provenance Graphs: Anticipating and Exploiting Schema-Level Data Provenance
Daniel Zinn, Bertram Ludäscher ({dzinn,ludaesch}@ucdavis.edu). Provenance graphs capture flow and dependency information recorded during scientific workflow runs, which can be used subsequently to interpret, validate, and debug workflow results. In this paper, we propose a new concept, called abstract provenance g...
Provenance Collection Support in the Kepler Scientific Workflow System
In many data-driven applications, analysis needs to be performed on scientific information obtained from several sources and generated by computations on distributed resources. Systematic analysis of this scientific information unleashes a growing need for automated data-driven applications that also can keep track of the provenance of the data and processes with little user interaction and ove...
Provenance in Scientific Workflow Systems
The automated tracking and storage of provenance information promises to be a major advantage of scientific workflow systems. We discuss issues related to data and workflow provenance, and present techniques for focusing user attention on meaningful provenance through “user views,” for managing the provenance of nested scientific data, and for using information about the evolution of a workflow...
SGProv: Summarization Mechanism for Multiple Provenance Graphs
Scientific workflow management systems (SWfMS) are powerful tools in the automation of scientific experiments. Several workflow executions are necessary to accomplish one scientific experiment. Data provenance, typically collected by SWfMS during workflow execution, is important to understand, reproduce and analyze scientific experiments. Provenance is about data derivation, thus it is typicall...
A Logic Programming Approach to Scientific Workflow Provenance Querying
Scientific workflows have become increasingly important for enabling and accelerating many scientific discoveries. More and more scientists and researchers rely on workflow systems to integrate and structure various local and remote heterogeneous data and services to perform in silico experiments. In order to support understanding, validation, and reproduction of scientific results, provenance ...